Health insurance fraud has become a serious issue due to the increasing number of digital claims and the involvement of multiple entities such as hospitals, doctors, and patients. Traditional methods often analyze claims individually and fail to capture hidden relationships that may indicate collusion. In this work, a hybrid approach is proposed to improve fraud detection by combining graph-based feature engineering with machine learning techniques. A network is constructed to represent interactions between patients, physicians, and hospitals, from which relational features such as connectivity and visit frequency are derived. In addition, an Isolation Forest model is used to identify unusual financial patterns in claims. These features are then used to train an XGBoost classifier to distinguish between genuine and fraudulent claims. The system also incorporates SHAP-based explanations to provide transparency in predictions. The results indicate that incorporating relational and anomaly-based features improves detection performance compared to traditional models. The proposed approach offers a practical and interpretable solution for identifying both individual and coordinated fraud in healthcare insurance systems.
Introduction
Health insurance fraud is a growing problem as claim processing moves online, and it takes forms such as false billing, upcoding, and collusion between patients, doctors, and hospitals. Traditional fraud detection methods, mainly rule-based systems or basic machine learning models, examine claims one at a time and therefore miss complex fraud patterns that span multiple entities.
To address these limitations, this work proposes a hybrid fraud detection system that combines:
Graph-based analysis to capture relationships between patients, physicians, and hospitals
Anomaly detection (Isolation Forest) to identify unusual financial behavior
Machine learning (XGBoost) to classify claims as fraudulent or genuine
Explainable AI (XAI), using SHAP, to keep predictions transparent and interpretable
The system models claims as a graph in which entities (patients, physicians, and hospitals) are nodes and the interactions between them are edges. Structural features such as connectivity patterns are extracted from this graph to help detect collusion. Financial and behavioral features (e.g., claim amount, length of stay, cost per day) are also engineered to improve detection accuracy.
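The graph construction and feature extraction described above can be illustrated with a minimal sketch. It is not the paper's implementation: the record fields (`claim_id`, `patient`, `physician`, `hospital`, `amount`, `length_of_stay`), the sample values, and the function name `relational_features` are all assumptions made for illustration, and node degree and patient-physician visit counts stand in for whatever structural features the full system uses.

```python
from collections import Counter, defaultdict

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"claim_id": "C1", "patient": "P1", "physician": "D1", "hospital": "H1",
     "amount": 12000.0, "length_of_stay": 3},
    {"claim_id": "C2", "patient": "P1", "physician": "D1", "hospital": "H1",
     "amount": 9000.0, "length_of_stay": 2},
    {"claim_id": "C3", "patient": "P2", "physician": "D1", "hospital": "H1",
     "amount": 4000.0, "length_of_stay": 0},
    {"claim_id": "C4", "patient": "P3", "physician": "D2", "hospital": "H2",
     "amount": 7000.0, "length_of_stay": 7},
]


def relational_features(claims):
    """Derive per-claim relational and financial features from claim records."""
    neighbours = defaultdict(set)  # node -> set of directly connected nodes
    pair_visits = Counter()        # (patient, physician) -> number of claims

    for c in claims:
        # Nodes are (type, id) tuples so IDs cannot collide across entity types.
        p = ("patient", c["patient"])
        d = ("physician", c["physician"])
        h = ("hospital", c["hospital"])
        # Each claim links its three entities pairwise.
        for a, b in ((p, d), (p, h), (d, h)):
            neighbours[a].add(b)
            neighbours[b].add(a)
        pair_visits[(c["patient"], c["physician"])] += 1

    rows = []
    for c in claims:
        # Assumption: a same-day claim (0 nights) is treated as one day.
        stay = max(c["length_of_stay"], 1)
        rows.append({
            "claim_id": c["claim_id"],
            "patient_degree": len(neighbours[("patient", c["patient"])]),
            "physician_degree": len(neighbours[("physician", c["physician"])]),
            "hospital_degree": len(neighbours[("hospital", c["hospital"])]),
            "pair_visit_count": pair_visits[(c["patient"], c["physician"])],
            "cost_per_day": c["amount"] / stay,
        })
    return rows


features = relational_features(claims)
for row in features:
    print(row)
```

Each output row can be joined to the claim's tabular features before training. A repeated patient-physician pair (here `P1` and `D1`) shows up as a higher `pair_visit_count`, which is the kind of relational signal that is invisible when claims are scored one at a time.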
The methodology includes:
Creating a realistic dataset with both genuine and fraudulent claims
Feature engineering (financial, temporal, relational, and anomaly-based)
Applying anomaly detection (Isolation Forest) to produce anomaly-based features for the classifier
Training and evaluating an XGBoost model using metrics such as precision, recall, F1-score, and ROC-AUC
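The evaluation metrics named in the last step can be computed as in the sketch below. It is a plain-Python illustration rather than the evaluation code used in this work: the labels, scores, 0.5 threshold, and function names are assumptions, and in practice a library implementation would normally be used. ROC-AUC is computed here through its pairwise interpretation, the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen genuine one.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """Fraction of (fraud, genuine) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative, imbalanced toy data: 6 genuine claims (0) and 2 fraudulent (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.10, 0.20, 0.60, 0.30, 0.05, 0.40, 0.90, 0.35]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, round(auc, 4))
```

Precision, recall, and F1 depend on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together on imbalanced fraud data.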
Key contributions of the approach:
Detects both individual and collusive fraud
Combines relational and financial analysis
Balances accuracy, efficiency, and interpretability
Supports real-time fraud detection
The study highlights that modern fraud detection must move beyond isolated claim analysis and incorporate network-based and data-driven techniques to handle evolving fraud patterns effectively.
Conclusion
This work examined the limitations of existing techniques for detecting health insurance fraud, particularly in cases involving coordinated and relational behaviors that are not visible at the level of individual claims. Traditional rule-based systems and standalone supervised models struggle with severe class imbalance and evolving fraud patterns, and they lack the interpretability required for investigative decision-making. To address these gaps, we proposed a lightweight graph-augmented XGBoost framework that combines tabular claim features with graph-derived relational indicators and anomaly-detection signals. SHAP explanations make individual predictions transparent, while adaptive retraining supports long-term resilience against drift. Experimental results indicate improved fraud-detection performance over baseline models without demanding high computational resources.
Overall, the framework provides a practical, explainable, and scalable approach to fraud detection and represents a step toward operational systems that can analyze claims collectively rather than in isolation. Future extensions may include federated collaboration across insurers and enhanced support for unstructured clinical information.